
feat: external MCP connectors via AgentCore Identity #174

Merged
philmerrell merged 35 commits into develop from feature/connectors
Apr 27, 2026


@philmerrell
Contributor

Summary

Replaces the in-house OAuth flow for external MCP tools with AgentCore Identity end-to-end, and switches the per-turn consent gate from a pre-flight short-circuit to mid-turn Strands interrupts so users don't have to retype their prompt after authorizing.

  • Admin path: provider CRUD now lives in AgentCore (CreateOauth2CredentialProvider + friends); our DynamoDB record keeps display/RBAC metadata only.
  • User path: new Settings → Connectors page lets users initiate / re-consent providers; /oauth-complete calls CompleteResourceTokenAuth before notifying the opener so the vault doesn't stay empty.
  • Agent path: OAuthConsentHook (BeforeToolCallEvent + AfterToolCallEvent) gates tool execution. First call → AgentCore vault hit or interrupt with consent URL. Stale token after a provider-side revoke → AfterToolCallEvent detects the 401, marks the user/provider for force_authentication, sets retry=True, and the next BeforeToolCallEvent raises a fresh interrupt.
  • Resume protocol: oauth_required SSE events now carry interruptId; the frontend snapshots the last turn's payload and replays it with interrupt_responses after the popup closes, so the agent picks up the same turn without a retype.
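
A minimal sketch of the resume payload described in the last bullet, written in Python for brevity even though the real snapshot lives in the frontend. The function name, the field names other than `message` and `interrupt_responses`, and the shape of each response entry are assumptions for illustration, not the actual wire format.

```python
import copy
from typing import Any


def build_resume_payload(last_request: dict[str, Any], interrupt_id: str) -> dict[str, Any]:
    """Replay the previous turn with an interrupt response attached."""
    # Deep-copy so later mutation of the live request cannot alter the snapshot.
    payload = copy.deepcopy(last_request)
    # The resumed turn carries no new user text; the agent continues the paused turn.
    payload["message"] = ""
    # Hypothetical entry shape: one response per interrupt being resolved.
    payload["interrupt_responses"] = [
        {"interruptId": interrupt_id, "response": "consent_completed"}
    ]
    return payload


snapshot = {
    "session_id": "s-1",
    "message": "What is on my calendar?",
    "enabled_tools": ["calendar"],
}
resume = build_resume_payload(snapshot, "interrupt-123")
```

The deep copy matters because nested lists such as `enabled_tools` would otherwise be shared between the snapshot and the replayed request.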

Test plan

  • Admin: register + edit + delete an OAuth provider in /admin/oauth-providers; confirm AgentCore credential provider mirrors changes.
  • User: connect / reconnect a provider from Settings → Connectors and verify popup → finalize → "Connected" badge.
  • Agent (cold path): with no consent given, send a prompt that triggers an OAuth-gated tool; consent banner appears, popup → consent → tool runs and answer streams in the same turn.
  • Agent (stale token): revoke at the provider (e.g. https://myaccount.google.com/permissions), retrigger the tool, confirm the 401-detected reauth path fires (AfterToolCallEvent → retry → fresh consent → resume).
  • Backend unit tests: uv run python -m pytest tests/agents/main_agent/integrations tests/agents/main_agent/session/test_oauth_consent_hook.py tests/agents/main_agent/integrations/test_oauth_token_cache.py — should be green.
  • Frontend unit tests: npm run test:ci — should be green.

🤖 Generated with Claude Code

philmerrell and others added 13 commits April 22, 2026 10:55
…middleware

First phase of the Connectors refactor, which will eventually replace the
bespoke OAuth token store (OAuthTokenRepository, KMS-encrypted DynamoDB,
Secrets Manager client credentials, manual refresh) with AgentCore Identity's
managed token vault and credential providers.

- AgentCoreContextMiddleware copies the four Runtime headers
  (WorkloadAccessToken, OAuth2CallbackUrl, session ID, request ID) into
  BedrockAgentCoreContext on every invocation. Required because the Inference
  API is a plain FastAPI app rather than BedrockAgentCoreApp, so the SDK does
  not populate the context for us. No-op when headers are absent, so local
  development and unit tests continue to work without mocks.

- AgentCoreIdentityClient wraps IdentityClient.get_token() with a narrower,
  platform-friendly surface for USER_FEDERATION (3LO) flows. Surfaces the
  "user consent required" case as a structured TokenResult(authorization_url=...)
  rather than an exception, so it can flow through the existing SSE stream as
  a new event type in a later phase.

Both modules are pure additions; no existing code path calls them yet.
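
The "structured result rather than an exception" idea can be sketched roughly as below. Only the `TokenResult(authorization_url=...)` shape comes from the commit message; the `ConsentRequired` exception is a stand-in for whatever signal the SDK really uses, and `fetch_token` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class ConsentRequired(Exception):
    """Stand-in for the SDK's consent-required signal (assumption)."""

    def __init__(self, authorization_url: str) -> None:
        super().__init__("user consent required")
        self.authorization_url = authorization_url


@dataclass(frozen=True)
class TokenResult:
    """Either a usable token or the URL the user must visit to consent."""

    access_token: Optional[str] = None
    authorization_url: Optional[str] = None

    @property
    def requires_consent(self) -> bool:
        return self.access_token is None and self.authorization_url is not None


def fetch_token(get_token: Callable[[], str]) -> TokenResult:
    """Convert the consent-required case into data instead of an exception."""
    try:
        return TokenResult(access_token=get_token())
    except ConsentRequired as exc:
        return TokenResult(authorization_url=exc.authorization_url)
```

Callers can then branch on `requires_consent` and forward the URL through a stream event without wrapping every call in exception handling.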

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Wires the Runtime context middleware into the Inference API and swaps the
external MCP client's token source from the bespoke OAuthService to
AgentCore Identity's USER_FEDERATION flow.

- main.py: installs AgentCoreContextMiddleware so WorkloadAccessToken and
  OAuth2CallbackUrl Runtime headers populate BedrockAgentCoreContext on every
  invocation.

- external_mcp_client.py: _get_oauth_token now returns a TokenResult from
  AgentCoreIdentityClient instead of a decrypted token string from
  OAuthService. Scopes are read from the platform's OAuth provider record so
  organizations can change them without code. When the SDK signals that user
  consent is required, the authorization URL is stashed per-user for the
  inference route to surface via an oauth_required SSE event (emitter to
  follow in a subsequent commit). load_external_tools skips client creation
  on consent-required rather than creating a client that would fail at the
  first request.

- Convention: the platform's provider_id is used verbatim as the AgentCore
  Identity credential-provider name. Admins register matching names via
  CreateOauth2CredentialProvider during provider setup.

The OAuthService, token vault, and encryption layer are still referenced by
unrelated code paths (admin routes, connections UI) and will be removed in
Phase 3 once the AgentCore-backed flow is validated end-to-end.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Rebrand the user-facing OAuth UI from "connections" to "connectors" for
consistent vernacular across the product. Folders, classes, types, and
route paths all follow the new name; the /settings/connections URL
redirects to /settings/connectors. The backend /oauth/connections
endpoint is preserved as a stable contract and translated at the
service layer.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Wraps bedrock-agentcore-control for admin-side OAuth2 credential provider
CRUD: create/update/delete/get with vendor mapping (Google/Microsoft/GitHub
to their native vendors; Canvas/Custom routed through CustomOauth2 via an
OIDC discovery URL or explicit authorization-server metadata). Domain
errors map 404/conflict/invalid-custom to typed exceptions so route
handlers can translate cleanly.

Update is intentionally non-partial: AgentCore's UpdateOauth2CredentialProvider
requires a full oauth2ProviderConfigInput and Get never returns the stored
client_secret, so credential rotation always re-submits both clientId and
clientSecret.

17 unit tests cover every vendor path, error mapping, and the Custom-only
discovery rule.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds Create/Update/Delete/Get/List on bedrock-agentcore OAuth2 credential
providers to the app-api task role, scoped to the default token vault.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Deletes the legacy 3LO dance that predates AgentCore Identity — the
per-user token vault, PKCE-based authorization service, encryption layer,
token cache, user-facing /oauth/* routes, and the tool-side OAuthToolService.
AgentCore Identity owns the token vault and consent flow now; the inference
path already routes through agentcore_identity.py via the recent external
MCP client refactor, so these modules had no live consumers.

Also slims shared/oauth/__init__.py to the surviving surface (provider model,
repository, registrar) and unwires the user-facing router from app_api/main.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AgentCore Identity owns the clientId, clientSecret, endpoint config, and
callback URL. Our DynamoDB record keeps only the admin metadata (display
name, scopes, role gates, icon) plus cached pointers to AgentCore's record
(credential_provider_arn, callback_url) for convenience.

Drops authorization_endpoint, token_endpoint, authorization_params,
userinfo_endpoint, revocation_endpoint, pkce_required, OAuthUserToken, and
the user-side connection DTOs — all artifacts of the retired in-house flow.
Adds oauth_discovery_url and authorization_server_metadata for Custom/Canvas
providers, gated by a pydantic validator.

Repository surface tightens to put_provider + apply_metadata_update; the
Secrets Manager write/read path is gone. Admin routes (commit next) own
the AgentCore round-trip and hand a fully-formed record to the repo.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
POST now calls the registrar first and, on success, upserts the metadata
record in DynamoDB. If the DB write fails after AgentCore has accepted
the credentials, we best-effort delete the AgentCore provider to avoid
orphans.
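
A rough sketch of the register-then-persist ordering with best-effort rollback. The three callables are hypothetical stand-ins for the registrar and repository; the real route handler is not shown in this PR text.

```python
from typing import Callable


def create_provider(
    register: Callable[[], str],
    save_record: Callable[[str], None],
    delete_remote: Callable[[str], None],
) -> str:
    """Register remotely first, then persist locally; undo the remote step on failure."""
    provider_arn = register()
    try:
        save_record(provider_arn)
    except Exception:
        try:
            # Best effort: avoid leaving an orphaned remote provider behind.
            delete_remote(provider_arn)
        except Exception:
            # A failed cleanup must not mask the original database error.
            pass
        raise
    return provider_arn
```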

PATCH distinguishes metadata-only edits (scopes, roles, display name,
icon, enabled) from credential rotation. Rotation requires clientId +
clientSecret together — partial updates are rejected by AgentCore's
UpdateOauth2CredentialProvider contract.
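
One way to picture the PATCH split, assuming snake_case field names that are illustrative only:

```python
# Hypothetical field names; the real request model may differ.
METADATA_FIELDS = {"scopes", "roles", "display_name", "icon", "enabled"}
CREDENTIAL_FIELDS = {"client_id", "client_secret"}


def classify_patch(fields: set[str]) -> str:
    """Return 'metadata_only' or 'rotation'; reject partial credential updates."""
    unknown = fields - METADATA_FIELDS - CREDENTIAL_FIELDS
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    supplied = fields & CREDENTIAL_FIELDS
    if not supplied:
        return "metadata_only"
    if supplied != CREDENTIAL_FIELDS:
        raise ValueError("credential rotation requires client_id and client_secret together")
    return "rotation"
```

A metadata-only result would touch just the local record, while a rotation would go through the full AgentCore update.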

DELETE removes the AgentCore provider first (which revokes every user
token stored in its vault), then the local record. Pre-existing connection-
count checks are dropped since per-user tokens no longer live in our DB.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Admin side:
- Rename admin/oauth-providers → admin/connectors (file + route); old
  route path redirects for URL stability
- Rewrite the admin model to the AgentCore-owned shape: drop endpoint
  fields, authorization_params, pkce_required, userinfo/revocation
  endpoints. Add credential_provider_arn, callback_url, and
  oauth_discovery_url / authorization_server_metadata for Custom vendors
- Rewrite the admin form: preset picker simplified to display metadata
  only, Custom requires an OIDC discovery URL, credential rotation
  requires clientId + clientSecret together (AgentCore's update API is
  not partial), success screen after create displays the AgentCore
  callback URL with a copy button so the admin can paste it into the
  vendor console, edit mode shows the callback URL + ARN read-only

User-facing retirement:
- Delete settings/connectors (user "my connected accounts" page),
  settings/oauth-callback (legacy 3LO return handler), and the sidebar
  + route entries for them. AgentCore Identity owns the consent flow
  at runtime via the existing /oauth-complete landing page

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When an external MCP tool needs OAuth consent, AgentCore Identity returns
an authorization URL instead of a token. This wires that signal all the
way to the user:

Backend:
- Inference route drains pending consent URLs from the external MCP
  integration after the agent stream finishes and emits one
  oauth_required SSE event per provider before done
- IAM grants bedrock-agentcore:GetResourceOauth2Token on the runtime role
  so the AgentCore Identity client can reach the token vault
- CLAUDE.MD + SSE_ERROR_MESSAGING.md document the new event

Frontend:
- Stream parser recognizes oauth_required and surfaces it as an
  OAuthRequiredEvent
- New /oauth-complete landing page handles the AgentCore callback
  redirect and postMessages consent completion to the opener tab
- OAuthConsentService orchestrates popup opening + postMessage receipt
- OAuthConsentBanner renders the Connect button inside the chat input
- chat-http and assistant preview pass OAuth2CallbackUrl header so
  AgentCore Runtime knows where to return after consent

Also updates the admin Tool form reference from /admin/oauth-providers
to /admin/connectors to match the renamed admin surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…izer

Adds the Settings → Connectors page so users can browse and connect
OAuth-backed external tools end-to-end:

- New /connectors routers on app-api (list user-visible providers via
  RBAC) and inference-api (initiate-consent, complete-consent) — the
  inference-api side runs under the AgentCore Runtime proxy where the
  WorkloadAccessToken context is populated.
- AgentCoreIdentityClient gains a workload-token mint fallback for local
  dev (GetWorkloadAccessTokenForUserId) and appends provider_id to the
  callback URL so the landing page can dismiss the right banner.
- /oauth-complete page POSTs CompleteResourceTokenAuth back through the
  inference-api before notifying the opener, fixing the "consent
  finished but vault stayed empty" race. Uses BroadcastChannel to
  bridge popup → opener under Chrome's COOP isolation.
- New connectors settings page with a Connect / Reconnect affordance
  per provider, wired to the OAuthConsentService popup flow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… interrupts

The agent used to pre-flight OAuth at tool-load time and abort the whole
turn if any provider needed consent — the user then had to retype the
prompt after authorizing. This switches to the Strands interrupt
protocol: the consent gate runs lazily before each tool call, pauses
the in-flight turn, and resumes it automatically once the user
finishes the popup.

Backend
- New OAuthConsentHook (BeforeToolCallEvent + AfterToolCallEvent).
  - BeforeToolCall: looks up the OAuth provider for the selected
    MCPAgentTool's MCPClient (no name coupling), checks the in-process
    token cache, and either lets the tool run or calls
    event.interrupt(...) with the consent URL when AgentCore Identity
    reports consent required.
  - AfterToolCall: detects 401-style failures from MCP tool results,
    marks the (user, provider) for force_authentication on the next
    fetch, and sets event.retry = True so the BeforeToolCall hook
    re-fires and triggers a fresh consent. Closes the gap where a
    provider-side revocation leaves a stale token in AgentCore's vault.
- New oauth_token_cache: per-(user, provider) tokens + force-reauth
  flags; lifecycle-managed by the hook.
- ExternalMCPIntegration always loads MCP clients with a lazy
  token_provider that reads from the cache; the pending_consent /
  drain_pending_consent dict and the route's pre-LLM short-circuit
  branch are gone.
- StreamCoordinator emits one oauth_required SSE event per pending
  interrupt before the final done event, carrying interruptId so the
  frontend can resume the same turn.
- ChatAgent.stream_async accepts interrupt_responses and forwards them
  to Strands as the resume prompt; route accepts the same on
  /invocations and skips quota + RAG augmentation on resume.

Frontend
- OAuthRequiredEvent type + validator gain interruptId; settings-page
  consent path makes interruptId optional (no agent turn to resume).
- OAuthConsentService tracks the interruptId per request and invokes a
  registered resume handler on broadcast success.
- ChatRequestService snapshots the last turn's payload and replays it
  with interrupt_responses attached when a consent completes — the
  user never retypes the prompt.

Smoke-tested end-to-end: Google revoke → whoami → 401 → AfterToolCall
detects + retries → fresh consent banner → popup → auto-resume → tool
returns greeting in the same turn.
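
The before/after interplay can be modelled without Strands at all. The toy below is an assumption-heavy simulation: `ToyVault`, `ConsentGate`, and the integer `status_code` argument are hypothetical, and the real hook inspects MCP tool results rather than a status code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyVault:
    """Stand-in for the managed token vault (hypothetical)."""

    tokens: dict[tuple[str, str], str] = field(default_factory=dict)

    def get(self, user: str, provider: str, force: bool) -> Optional[str]:
        # A forced re-authentication ignores whatever token is stored.
        return None if force else self.tokens.get((user, provider))


@dataclass
class ConsentGate:
    vault: ToyVault
    force_reauth: set[tuple[str, str]] = field(default_factory=set)
    retried: set[tuple[str, str]] = field(default_factory=set)

    def before_tool_call(self, user: str, provider: str) -> str:
        key = (user, provider)
        token = self.vault.get(user, provider, force=key in self.force_reauth)
        return "run" if token else "interrupt"

    def after_tool_call(self, user: str, provider: str, status_code: int) -> bool:
        """Return True when the call should be retried after fresh consent."""
        key = (user, provider)
        if status_code != 401 or key in self.retried:
            return False
        self.retried.add(key)       # per-turn guard: retry at most once
        self.force_reauth.add(key)  # next fetch must bypass the stale token
        return True

    def consent_completed(self, user: str, provider: str, token: str) -> None:
        key = (user, provider)
        self.vault.tokens[key] = token
        self.force_reauth.discard(key)
```

Walking the smoke test through this model: a 401 makes `after_tool_call` return True once, the next `before_tool_call` yields `"interrupt"`, and `consent_completed` lets the retried call run.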

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
logger.error(
"CompleteResourceTokenAuth failed for user=%s provider=%s: %s",
current_user.user_id,
body.provider_id,
Comment thread backend/src/apis/inference_api/connectors/routes.py Fixed
@philmerrell
Contributor Author

Code Review

Large structural rewrite (+5,125 / −5,781 across 87 files) that replaces the in‑house OAuth flow with AWS Bedrock AgentCore Identity and moves OAuth consent from a pre‑flight gate to mid‑turn Strands interrupts. Architecture is cleaner (AgentCore owns secrets/vault; our DB owns metadata), the resume‑without‑retyping UX is a nice win, and tests look complete. A handful of real issues should be addressed before merging.

Critical Issues

| # | File | Line | Issue | Severity |
|---|------|------|-------|----------|
| 1 | backend/src/apis/inference_api/connectors/routes.py | 126–165 | complete_consent forwards a user-supplied session_uri to AgentCore's CompleteResourceTokenAuth without verifying the URI was issued for this user. If AgentCore doesn't strictly bind sessionUri to userIdentifier, a user who learns another user's request_uri (e.g. via browser logs, shared link) could land tokens in their own vault or complete another user's consent. Add server-side tracking of initiated sessions, or explicitly verify AgentCore enforces this binding and document the assumption. | 🔴 Critical |
| 2 | backend/src/agents/main_agent/session/hooks/oauth_consent.py | 51–76 | _AUTH_FAILURE_PATTERN matches partial words for invalid[_\s-]token, expired[_\s-]token, etc. with no \b boundary. A tool error whose text includes /v1/401/... or references "unauthorized" elsewhere will trigger force-reauth and surface a spurious consent popup. Tighten to word boundaries and prefer status_code/explicit markers over regex scraping. | 🟡 High |
| 3 | backend/src/apis/shared/oauth/agentcore_registrar.py | 291 | response.get("clientSecretArn") or {}: if AgentCore ever returns this field as null or an unexpected shape, you get silent data loss or a runtime AttributeError. Add a type assertion so future API changes surface as real errors. | 🟡 Medium |
| 4 | backend/src/apis/app_api/admin/oauth/routes.py | 115–130 | Create-provider rollback on DB failure is best-effort; if rollback also fails you have an orphaned AgentCore provider and no reconciliation. Either store a pending-cleanup row to retry async, or at least emit a CloudWatch metric/alarm on the orphan case. | 🟡 Medium |
| 5 | backend/src/apis/inference_api/connectors/routes.py | 141–148 | Inline import boto3; import os inside the handler, leaks access to the private _RUNTIME_WORKLOAD_ENV (never actually used), and constructs the boto3 client per request. Move imports to module scope and cache the client. | 🟡 Medium |

Suggestions

| # | File | Line | Suggestion | Category |
|---|------|------|------------|----------|
| 1 | backend/src/agents/main_agent/integrations/oauth_token_cache.py | 38 | def set(...) shadows the built-in. Rename to put or store. | Style |
| 2 | backend/src/agents/main_agent/session/hooks/oauth_consent.py | 237 | Dedup key _oauth_reauth_attempted could collide with a future Strands field. Use a module constant with a distinctive name. | Maintainability |
| 3 | agentcore_identity.py vs provider_repository.py | 111 / 30 | Default region inconsistency: us-east-1 vs us-west-2. Centralize in one config module. | Correctness |
| 4 | backend/src/agents/main_agent/integrations/external_mcp_client.py | 220 | In-process self.clients dict keyed by user_id:tool_id has no eviction. Long-running Fargate tasks accumulate forever. Bound size (LRU) or evict on session end. | Performance |
| 5 | backend/src/agents/main_agent/base_agent.py | 182–189, 302–304 | New ThreadPoolExecutor() per tool lookup and per external-tool load. Reuse a module-level executor. | Performance |
| 6 | backend/src/agents/main_agent/streaming/stream_coordinator.py | 204 | _extract_oauth_required_events reads agent._interrupt_state, a private Strands attribute. Any SDK rename breaks OAuth silently. Add a try/except with a loud warning, or ask Strands for a public accessor. | Maintainability |
| 7 | frontend/ai.client/src/app/oauth-complete/oauth-complete.page.ts | 266 | this.config.inferenceApiUrl().replace(/\/invocations\/?$/, '') is fragile. Add a dedicated inferenceApiBaseUrl() config or compose from a known base + path. | Maintainability |
| 8 | CLAUDE.MD | | Filename CLAUDE.MD (not CLAUDE.md) diverges from the path used elsewhere on case-sensitive filesystems. Pre-existing but worth renaming. | Style |
| 9 | backend/src/apis/app_api/admin/oauth/routes.py | 176 | "Discovery config can only be updated together with a credential rotation": surface AgentCore's own constraint in the error message for operators. | UX |
| 10 | frontend/ai.client/src/app/session/services/chat/chat-request.service.ts | 104 | this.lastRequestObject = { ...requestObject } is a shallow copy. Nested arrays (enabled_tools, file_upload_ids) can mutate after snapshot. Use structuredClone. | Correctness |

What Looks Good

  • Clean division of authority between AgentCore (secrets, vault, callback URL) and DynamoDB (display/RBAC/scopes), well documented in module docstrings.
  • OAuthConsentHook elegantly covers both cold‑start and stale‑token paths via BeforeToolCallEvent + AfterToolCallEvent, with a per‑turn retry guard.
  • BroadcastChannel + postMessage dual path for popup → opener handoff correctly anticipates COOP severing window.opener after the popup traverses external origins.
  • Resume protocol is thoughtful: snapshotting the prior request, sending an empty message with interrupt_responses, bypassing quota/RAG/file resolution on resume.
  • Test coverage is meaningful: test_oauth_consent_hook.py (+419) and test_agentcore_identity.py (+182) exercise the consent paths.
  • Frontend SSE parser correctly allows oauth_required after message_stop/done.
  • IAM scoping in inference-api-stack.ts:490 properly constrains GetResourceOauth2Token to the token vault and workload‑identity directory.

Verdict

Request Changes — merge‑blocking on items #1 (verify complete_consent authorization binding, or add server‑side session tracking) and #2 (tighten auth‑failure regex). Remaining items safe as follow‑ups.

CI: mergeStateStatus: UNSTABLE; "Test Python Code" on App API and Inference API workflows were still in progress at review time — verify they pass before merging.

🤖 Generated with Claude Code

philmerrell and others added 2 commits April 22, 2026 17:46
…uth-failure regex

Hardens two gaps called out in review of the AgentCore OAuth flow.

- `/connectors/complete-consent` now verifies the submitted `session_uri`
  was issued to the authenticated user at `initiate_consent`, rejecting
  cross-user replay with 403 before ever calling AgentCore. Backed by a
  thread-safe TTL cache (10 min, single-use). Soft-fails with a warning
  when AgentCore's authorize URL doesn't carry a recognised session
  parameter, so an SDK shape change logs rather than blocks.
- `_AUTH_FAILURE_PATTERN` tightened with word boundaries on every clause
  and a non-path guard on `401` so tool errors containing `/v1/401/...`
  no longer trigger a spurious force-reauth.
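
A small illustration of what "word boundaries plus a non-path guard on `401`" might look like. This is not the actual `_AUTH_FAILURE_PATTERN`; the clauses and the helper name are assumptions chosen to show the technique.

```python
import re

# Illustrative pattern only; the real clause list in the hook is not shown here.
AUTH_FAILURE = re.compile(
    r"""
    (?<![/\w])401(?![/\w])        # bare 401, not a path segment such as /v1/401/
    | \binvalid[_\s-]token\b      # whole phrase only, not "invalid_tokenizer"
    | \bexpired[_\s-]token\b
    """,
    re.IGNORECASE | re.VERBOSE,
)


def looks_like_auth_failure(text: str) -> bool:
    """Return True when the text carries a high-confidence auth-failure marker."""
    return AUTH_FAILURE.search(text) is not None
```

The lookarounds reject `401` when it is glued to a slash or another word character, which covers both URL path segments and longer numbers such as `4010`.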

Also moves `import boto3`/`os` out of the `complete_consent` handler
body and caches the control-plane client via `lru_cache`.
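
The single-use, 10-minute session check from the first bullet could be approximated as follows. The class and method names are hypothetical, and the injectable clock exists only to make the sketch testable.

```python
import threading
import time
from typing import Callable


class SingleUseSessionCache:
    """Remember which user initiated a consent session; allow one completion."""

    def __init__(
        self,
        ttl_seconds: float = 600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._entries: dict[str, tuple[str, float]] = {}

    def remember(self, session_uri: str, user_id: str) -> None:
        with self._lock:
            self._entries[session_uri] = (user_id, self._clock() + self._ttl)

    def consume(self, session_uri: str, user_id: str) -> bool:
        """Return True once for the issuing user before expiry; False otherwise."""
        with self._lock:
            entry = self._entries.get(session_uri)
            if entry is None:
                return False
            owner, expires_at = entry
            if self._clock() >= expires_at:
                del self._entries[session_uri]
                return False
            if owner != user_id:
                # Keep the entry so the rightful owner can still finish.
                return False
            del self._entries[session_uri]
            return True
```

A route using something like this would translate a `False` result into the 403 mentioned above. Because the state is per process, it only works when initiate and complete reach the same worker.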

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…back

Addresses the remaining two critical items from PR #174 review.

Registrar response parsing (`_info_from_response`): fails loudly on
contract violations rather than silently storing empty strings. Missing
`clientSecretArn` still tolerated (some vendors won't persist one) but
a wrong-shape `clientSecretArn` or absent `credentialProviderArn` now
raises TypeError so an AgentCore API change surfaces as a real error.

Admin create-provider rollback (`_rollback_orphaned_provider`): now
retries the AgentCore delete twice with backoff before giving up.
On exhaustion, emits a CloudWatch `Agentcore/OAuth::ProviderOrphaned`
custom metric so ops can alarm on stranded credential providers.
Secondary failures (CW down, registrar down after retries) never
shadow the admin's original 5xx — they only log. The subsequent
create attempt that hits `CredentialProviderConflictError` with no
DB record now returns an actionable 409 pointing at the AWS CLI
cleanup command instead of a bare "already exists".
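
A hedged sketch of the retry-then-alarm rollback. It assumes "retries twice" means three attempts in total; the delays, the function name, and the callables are illustrative rather than taken from `_rollback_orphaned_provider`.

```python
import time
from typing import Callable


def rollback_with_retries(
    delete_remote: Callable[[], None],
    emit_orphan_metric: Callable[[], None],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to delete the remote provider; report an orphan when every attempt fails."""
    for attempt in range(attempts):
        try:
            delete_remote()
            return True
        except Exception:
            # Exponential backoff between attempts, none after the last one.
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    try:
        emit_orphan_metric()
    except Exception:
        # Secondary failures must never shadow the caller's original error.
        pass
    return False
```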

App API task role grants `cloudwatch:PutMetricData` scoped to the
`Agentcore/OAuth` namespace via a condition key.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread backend/src/apis/inference_api/connectors/routes.py Fixed
Comment thread backend/src/apis/inference_api/connectors/routes.py Fixed
philmerrell and others added 3 commits April 22, 2026 21:00
- Reject non-https authorizationUrls at both intake and open time so a
  compromised backend can't smuggle javascript:/data: URIs into a user
  click.
- Replace window.location.href hijack on popup-block with a blocked
  signal; the banner renders an "Open in new tab" anchor instead of
  tearing down the chat tab.
- Reject resume requests whose interruptIds aren't present in the cached
  agent's _interrupt_state with 400, preventing silent acceptance after
  cache eviction, process restart, or forged payloads.
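
The https-only rule in the first bullet lives in the frontend, but the check itself is language-neutral. A Python rendering, with a hypothetical function name, might look like this:

```python
from urllib.parse import urlsplit


def is_safe_authorization_url(url: str) -> bool:
    """Accept only absolute https URLs; reject javascript:, data:, http:, and the like."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        return False
    return parts.scheme == "https" and bool(parts.netloc)
```

Applying the same predicate both when the event is received and again just before opening the popup gives the two checkpoints the commit describes.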

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CodeQL flagged the provider_id interpolation as clear-text logging of
sensitive data — its taint analysis traces provider_id back through the
OAuth credential path. The provider ID itself isn't secret, but the log
line doesn't need it: tool_id already identifies the tool, and
"(OAuth)" alone confirms auth was wired up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… tests

Both tests codify behavior that commit b55653d intentionally retired:

- TestInvocationsOAuthRequired exercised drain_pending_consent and the
  route-level oauth_required emission path. That path is gone — consent
  URLs now flow through Strands' _interrupt_state inside
  agent.stream_async (stream_coordinator.py:543), and the hook behavior
  is covered by tests/agents/main_agent/session/test_oauth_consent_hook.py.
- test_missing_message_returns_422 expected message to be required, but
  InvocationRequest.message is now default "" so resume requests can
  reuse the original prompt from interrupt context.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
philmerrell and others added 4 commits April 25, 2026 07:44
…1 detection

Fixes the constant Google re-auth bug: the consent hook was calling
AgentCore Identity with `callback_url=None` whenever the inference API
ran outside the Runtime proxy (every local-dev session). AgentCore then
issued an authorize URL whose redirect went somewhere other than
`/oauth-complete`, so consent never finalized and every request looped
back through the consent flow.

Adds a `CallbackUrlUnavailableError` and an `AGENTCORE_LOCAL_OAUTH_CALLBACK_URL`
env-var fallback in `_resolve_callback_url`, so the failure mode is now
loud instead of silent. Both the chat-triggered consent hook and the
settings-page `initiate-consent` route catch it and return 503 with
actionable guidance.
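
A minimal sketch of the fallback logic, assuming the Runtime-provided URL is passed in as an optional string. The error class and environment variable names come from the commit message; the function signature and error text are assumptions.

```python
import os
from typing import Mapping, Optional


class CallbackUrlUnavailableError(RuntimeError):
    """Raised when no OAuth callback URL can be determined."""


def resolve_callback_url(
    runtime_callback_url: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> str:
    """Prefer the Runtime-provided URL, then the local-dev env var, else fail loudly."""
    if runtime_callback_url:
        return runtime_callback_url
    fallback = env.get("AGENTCORE_LOCAL_OAUTH_CALLBACK_URL", "").strip()
    if fallback:
        return fallback
    raise CallbackUrlUnavailableError(
        "No OAuth callback URL available; set AGENTCORE_LOCAL_OAUTH_CALLBACK_URL "
        "for local development"
    )
```

Raising instead of passing `None` downstream is what turns the silent re-auth loop into an explicit 503.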

Also tightens the OAuth 401 detection regex to reduce false-positive
re-auth prompts: `\bunauthorized\b` now requires proximity to an
HTTP/status/code keyword (previously matched prose like "unauthorized
to view this calendar"), and adds high-confidence signals for OAuth
`invalid_grant` (refresh-token revocation) and Google's `UNAUTHENTICATED`
status / `invalid authentication credentials` message.

Drops the in-process `session_cache` defence-in-depth on
`complete-consent`: AgentCore's own `userIdentifier` ↔ `sessionUri`
binding already rejects mismatched completions, and the local cache
cost real operational pain (multi-worker / restart / `--reload` would
break legitimate consent flows with a confusing 403). Trust the
JWT-derived `current_user` plus AgentCore's binding instead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Several user-facing connector improvements that share a foundation
(per-user `force_reauth` lifecycle in the in-process token cache):

- New `GET /connectors/{id}/status`: side-effect-free read that the
  settings page uses to render a "Connected" badge without committing
  the user to a consent flow (initiate-consent always triggers a
  server-side pending session). Honors the `force_reauth` flag — a
  just-disconnected user is reported as not connected even if the vault
  still holds an unexpired token.

- New `DELETE /connectors/{id}/connection`: best-effort disconnect that
  flips the local `force_reauth` flag (AgentCore exposes no per-user
  vault-delete API). The next status check returns `connected: false`,
  the next initiate-consent passes `force_authentication=True`, and the
  user re-authorizes from scratch. complete-consent clears the flag on
  success so the UI flips back to connected without waiting on the agent
  loop to warm the cache.

- Frontend Disconnect button on connected rows. Confirmation dialog uses
  the existing `ConfirmationDialogComponent` (CDK Dialog, destructive
  styling) — also swapped the admin connector-list delete from native
  `confirm()` to the same component for visual consistency.

- Closed-popup recovery in `OAuthConsentService`: poll `popup.closed`
  after open and drop the provider from `inFlight` if the user dismisses
  without completing consent. The pending request stays so the chat
  banner re-offers Connect; the settings page resets `awaiting` →
  `idle` via the new `inFlightProviders` signal.

- Settings page: loading skeleton in the row's action area while the
  status probe resolves, dropped the misleading "Reconnect" button
  (clicking it just hit `initiate-consent` and toasted "already
  connected"), and removed the scope-list display under each connector.

- Forward Google's `access_type=offline` (per AgentCore Identity docs)
  via a new vendor-baseline helper, plumbed through both the
  chat-triggered consent hook and the settings/initiate-consent /
  status routes via two new optional lookups on `OAuthConsentHook`
  (`provider_type_lookup`, `custom_parameters_lookup`). Without this
  Google issues a 1-hour access token with no refresh path and the
  vault entry becomes unrefreshable.

- Admin-configurable `custom_parameters` field on the OAuth provider
  record (DynamoDB `customParameters` map, Pydantic Create/Update/
  Response, admin form `key=value` textarea with parse/serialize
  helpers). Merged with the vendor baseline at request time — baseline
  wins on conflict so admins cannot accidentally turn off documented
  vendor requirements.
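
The "baseline wins on conflict" merge from the last bullet reduces to dictionary unpacking order. The table and function names below are hypothetical; only the Google `access_type=offline` baseline is taken from the commit text.

```python
# Hypothetical baseline table keyed by provider type.
VENDOR_BASELINE: dict[str, dict[str, str]] = {
    "google": {"access_type": "offline"},
}


def merged_custom_parameters(provider_type: str, admin_params: dict[str, str]) -> dict[str, str]:
    """Merge admin-supplied parameters with the vendor baseline; baseline wins."""
    baseline = VENDOR_BASELINE.get(provider_type, {})
    # Later unpacking overrides earlier keys, so baseline values take precedence.
    return {**admin_params, **baseline}
```

For example, an admin value of `access_type=online` on a Google provider would be overridden, while unrelated keys pass through unchanged.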

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…aceholders

Per the AgentCore Identity supported-providers docs, Slack, Salesforce,
and Zoom are first-class vendors with pre-configured endpoints — admins
only need to supply credentials. Verified the exact `credentialProviderVendor`
strings and `oauth2ProviderConfigInput` keys against the SDK shape
(`Oauth2ProviderConfigInput.members`):

  - Slack       → SlackOauth2      / slackOauth2ProviderConfig
  - Salesforce  → SalesforceOauth2 / salesforceOauth2ProviderConfig
  - Zoom        → ZoomOauth2       / includedOauth2ProviderConfig
                                     (shared key for simpler vendors)

Backend additions: `SLACK`, `SALESFORCE`, `ZOOM` on `OAuthProviderType`;
vendor + config-key entries on the registrar. The existing discovery-URL
guard correctly rejects discovery URLs for these new types.
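
The vendor mapping above can be pictured as a lookup table feeding the create request. The vendor strings and config keys are the ones listed in this commit; the lowercase type keys, the helper name, and the exact request layout (including the `name` field) are assumptions.

```python
# Vendor string and config key per provider type, as listed above.
VENDOR_CONFIG: dict[str, tuple[str, str]] = {
    "slack": ("SlackOauth2", "slackOauth2ProviderConfig"),
    "salesforce": ("SalesforceOauth2", "salesforceOauth2ProviderConfig"),
    "zoom": ("ZoomOauth2", "includedOauth2ProviderConfig"),
}


def build_create_request(
    provider_type: str, name: str, client_id: str, client_secret: str
) -> dict[str, object]:
    """Assemble a create payload for a first-class vendor (layout is an assumption)."""
    vendor, config_key = VENDOR_CONFIG[provider_type]
    return {
        "name": name,
        "credentialProviderVendor": vendor,
        "oauth2ProviderConfigInput": {
            config_key: {"clientId": client_id, "clientSecret": client_secret},
        },
    }
```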

Frontend additions: matching `ConnectorType` literals; preset entries
with sensible default scopes and vendor-relevant placeholder hints (e.g.
Salesforce `api, refresh_token, offline_access, id, openid`); icon
class branches for the new tiles (Slack fuchsia + chat bubble,
Salesforce sky + cloud, Zoom blue + video camera).

Form polish:

- `scopesPlaceholder` / `customParametersPlaceholder` on each preset.
  Form binds them via computed signals so the hints update as the admin
  switches between providers.
- Selecting a preset seeds `customParameters` only when the preset
  declares `defaultCustomParameters` — avoids clobbering user-typed
  content for presets that have only a hint.
- Dropped the Google `defaultScopes`. The OIDC-only
  `openid email profile` set doesn't actually let an agent do anything
  useful with Google APIs (Calendar/Gmail/Drive each need different
  scopes), so the form lands empty and the placeholder shows the URL
  format as a hint.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
logger.info(
"Completed OAuth consent for user=%s provider=%s",
current_user.user_id,
body.provider_id,
philmerrell and others added 3 commits April 25, 2026 13:22
…store

Replaces the floating OAuth banner with an inline prompt anchored to the
assistant turn that triggered consent, and persists pending interrupts to
session metadata so a browser refresh rediscovers them instead of leaving
the tool call orphaned in `pending` forever.

Backend
- New `PendingInterrupt` model on `apis.shared.sessions.models`; included
  on `MessagesListResponse` and `SessionMetadata`.
- `metadata.add_pending_interrupt` / `remove_pending_interrupts` /
  `get_pending_interrupts` helpers using GSI lookup + targeted UpdateExpression.
- `StreamCoordinator._extract_oauth_required_events` is now async and
  persists each interrupt before yielding the SSE event; failures log but
  never break the live stream.
- `get_messages_from_cloud` fetches pending interrupts in parallel.
- `/invocations` resume path clears resolved interrupts from metadata
  after `agent.stream_async` completes.
- New `DELETE /sessions/{sid}/pending-interrupts/{iid:path}` endpoint
  for explicit dismiss; colon-bearing Strands ids preserved via `:path`.

Frontend
- New `OAuthConsentPromptComponent` with a refined inline card design,
  connector icon (admin base64 wins over heroicon, falls back to
  providerType default), eyebrow/lock motif, primary gradient action
  button, hover-revealed dismiss, fade+slide entrance.
- `MessageMapService.loadMessagesForSession` hydrates pending interrupts
  on session load; anchors to triggering message id when present, else
  the most recent assistant message.
- `OAuthConsentService.openConsentPopup` is async; lazy-fetches a fresh
  authorization URL via `initiate-consent` when the stored one is absent
  or expired (handles "already consented in another tab" by auto-resuming).
- `OAuthConsentService.dismiss` syncs to backend by default; completion
  flow opts out so the resume path's own cleanup isn't double-fired.
- `MessageListComponent` renders unanchored interrupts at end-of-list as
  a fallback for the "partial assistant message wasn't persisted" case.
- `awaiting_auth` derived tool status renders as a primary-blue ring on
  the tool-rail dot instead of an indefinite amber spinner.
- `ChatRequestService.resumeFromOAuthConsent` accepts a fallback session
  id (post-refresh case where `lastRequestObject` is null) and surfaces
  400 `Unknown or expired interrupt ids` as a conversational error.
- Old floating `OAuthConsentBannerComponent` removed.

Known follow-up
- First-turn-of-a-new-session OAuth: persistence currently no-ops because
  the session metadata row doesn't exist yet when the interrupt fires.
  Tracked separately; sidecar item or upsert pattern is the likely fix.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
philmerrell and others added 3 commits April 25, 2026 21:28
- Add functions to ensure session metadata existence and update session title and activity.
- Implement logic for handling session activity updates, including message count increments and preferences merging.
- Introduce deduplication for pending interrupts to prevent duplicate entries during session updates.
- Update frontend components to reflect changes in session management, including OAuth consent prompts and message handling.
- Refactor session service interfaces to use camelCase for consistency with backend responses.
- Enhance tests for session activity updates, pending interrupts, and session metadata handling.
Resume after an OAuth-gated tool call only worked when the in-memory
agent cache still held the original turn. After a browser refresh the
frontend lost its request snapshot and the resume request landed with
no enabled_tools / model_id, so the inference API rebuilt a fresh agent
with an empty external-tool registry — the paused tool call had nothing
to resume against and the LLM responded that the tool wasn't available.

Resume contract now lives server-side. On pause, the stream coordinator
captures a ``PausedTurnSnapshot`` (enabled_tools, model_id, provider,
temperature, system_prompt, caching_enabled, max_tokens) onto the
session row alongside the existing ``pendingInterrupts``. On resume,
the inference API loads the snapshot and rebuilds the agent from it;
Strands' SessionManager then restores ``_interrupt_state`` from
AgentCore Memory, so the paused tool call picks up where it left off
regardless of cache hit/miss, refresh, or pod restart.

Frontend ``lastRequestObject`` snapshotting is gone — the resume
payload is now ``{ session_id, message: '', interrupt_responses }``.
Server-side snapshot has a 1h TTL; cleared on full turn completion
and at the start of any new (non-resume) turn.
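
As a rough illustration of the server-side resume contract described above, the snapshot and its expiry check might look something like the sketch below. The field names come from the commit message; the field types, the defaults, the `expires_at` attribute, and the `load_for_resume` helper are assumptions made for the example and are not taken from the actual code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional

SNAPSHOT_TTL = timedelta(hours=1)  # the commit message states a 1h TTL


def _default_expiry() -> datetime:
    return datetime.now(timezone.utc) + SNAPSHOT_TTL


@dataclass(frozen=True)
class PausedTurnSnapshot:
    # Field names follow the commit message; types and defaults are assumed.
    enabled_tools: list[str]
    model_id: str
    provider: str
    temperature: Optional[float] = None
    system_prompt: Optional[str] = None
    caching_enabled: bool = True
    max_tokens: Optional[int] = None
    expires_at: datetime = field(default_factory=_default_expiry)

    def is_expired(self, now: Optional[datetime] = None) -> bool:
        now = now or datetime.now(timezone.utc)
        return now > self.expires_at


def load_for_resume(
    snapshot: Optional[PausedTurnSnapshot],
    now: Optional[datetime] = None,
) -> PausedTurnSnapshot:
    """Hypothetical helper: reject a resume with no usable snapshot."""
    if snapshot is None:
        raise LookupError("no paused_turn snapshot for session")
    if snapshot.is_expired(now):
        raise TimeoutError("Paused turn expired; restart the turn.")
    return snapshot
```

With the snapshot on the session row, the resume request only needs `session_id` and `interrupt_responses`; everything else required to rebuild the agent is read back from the row.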

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…n't fail a turn

Previously, ``load_external_tools`` cached newly-created MCP clients
without verifying the server was actually reachable. A single connector
that wasn't running locally (or whose endpoint was misconfigured) would
sit in the registry and fail the whole turn the first time Strands
called ``load_tools()`` on it.

Pre-flight each new client immediately after construction. On failure,
log a warning, skip the tool, and continue — the user keeps their other
tools. On success the call also primes the client's tool cache, so
Strands' later ``load_tools()`` becomes a no-op.
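
A minimal sketch of the pre-flight idea, assuming each client exposes a `name` and a `list_tools()` call that raises when the server is unreachable. The `ToolClient` protocol and `preflight_clients` function are hypothetical names for illustration, not the real `load_external_tools` internals.

```python
import logging
from typing import Iterable, Protocol

logger = logging.getLogger(__name__)


class ToolClient(Protocol):
    # Hypothetical shape of a connector client, assumed for this sketch.
    name: str

    def list_tools(self) -> list: ...


def preflight_clients(clients: Iterable[ToolClient]) -> list[ToolClient]:
    """Return only the clients whose server answered a tool listing."""
    healthy: list[ToolClient] = []
    for client in clients:
        try:
            # A successful call would also warm the client's tool cache.
            client.list_tools()
        except Exception as exc:  # any failure means "skip this connector"
            logger.warning("Skipping connector %s: %s", client.name, exc)
            continue
        healthy.append(client)
    return healthy
```

Catching a broad `Exception` is deliberate here: the goal is that one unreachable connector degrades to a warning rather than failing the turn.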

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
"""
user_id = current_user.user_id

logger.info("DELETE /sessions/%s/pending-interrupts/%s", session_id, interrupt_id)
"""
user_id = current_user.user_id

logger.info("DELETE /sessions/%s/pending-interrupts/%s", session_id, interrupt_id)
if not snapshot:
logger.warning(
"Resume rejected: no paused_turn snapshot for session %s",
input_data.session_id,
if expires_at and datetime.now(timezone.utc) > expires_at:
logger.warning(
"Resume rejected: paused_turn snapshot expired for session %s",
input_data.session_id,
detail="Paused turn expired; restart the turn.",
)

caching_enabled = snapshot.caching_enabled
philmerrell and others added 6 commits April 26, 2026 15:52
ensure_session_metadata_exists() now runs unconditionally on /invocations
and raises when DYNAMODB_SESSIONS_METADATA_TABLE_NAME is unset, breaking
route tests that mock the agent and skip DynamoDB. Stub it via an autouse
fixture so route tests exercise the route, not the persistence layer. Also
patch the new get_pending_interrupts call in the cloud-message tests.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The disconnect flag lived in a module-level set inside the inference API
process, so a /disconnect on one replica was invisible to any other.
Under multi-replica deploys the user could see "Connected" on one
request and "needs consent" on the next, and the AfterToolCallEvent
401-retry path likewise lost its intent on replica fan-out.

Move the per-(user, provider) disconnect flag to a new
OAuthDisconnectRepository on the existing oauth-user-tokens DynamoDB
table (already provisioned, KMS-encrypted, with R/W IAM granted to the
inference API). The token cache stays as a hot-path L1 for tokens only;
the consent hook reads the disconnect repo on every BeforeToolCallEvent
so a disconnect anywhere is honored on the next tool run anywhere.
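
To illustrate why the flag has to live in shared storage, the sketch below models two replicas with separate in-process token caches and one shared disconnect store. `InMemoryDisconnectRepo` stands in for the DynamoDB-backed repository and `token_for_tool_call` is a hypothetical helper; neither mirrors the real class or method names.

```python
from typing import Optional

Key = tuple[str, str]  # (user_id, provider_id)


class InMemoryDisconnectRepo:
    """Stand-in for a shared store that every replica reads."""

    def __init__(self) -> None:
        self._flags: set[Key] = set()

    def mark_disconnected(self, user_id: str, provider_id: str) -> None:
        self._flags.add((user_id, provider_id))

    def clear(self, user_id: str, provider_id: str) -> None:
        self._flags.discard((user_id, provider_id))

    def is_disconnected(self, user_id: str, provider_id: str) -> bool:
        return (user_id, provider_id) in self._flags


def token_for_tool_call(
    user_id: str,
    provider_id: str,
    l1_cache: dict[Key, str],
    repo: InMemoryDisconnectRepo,
) -> Optional[str]:
    """Consult the shared flag first; only then trust the local cache."""
    key = (user_id, provider_id)
    if repo.is_disconnected(user_id, provider_id):
        l1_cache.pop(key, None)  # drop the stale local token
        return None  # caller would raise a consent interrupt
    return l1_cache.get(key)
```

Because the flag is checked before the local cache on every call, a disconnect recorded through one replica is seen by the next tool call on any other replica.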

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…wlist

The frontend posts an `OAuth2CallbackUrl` header on every consent-related
request, and the inference-api middleware was forwarding it verbatim into
`BedrockAgentCoreContext`. An authenticated user could pivot the OAuth
redirect to an attacker-controlled origin and capture the authorization
code on consent. Reuse `CORS_ORIGINS` as the trust boundary, pin the
path to `/oauth-complete`, and reject non-http(s) schemes, query strings,
and fragments.
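
The validation rules listed above could be sketched as follows. This is only an illustration under stated assumptions: `is_trusted_callback` is a hypothetical name, and the allowed origins are passed in as a set rather than read from `CORS_ORIGINS`.

```python
from urllib.parse import urlsplit

ALLOWED_PATH = "/oauth-complete"


def is_trusted_callback(url: str, allowed_origins: set[str]) -> bool:
    """Accept only <allowed origin>/oauth-complete with nothing appended."""
    # Reject any query or fragment marker, including a bare trailing "?".
    if "?" in url or "#" in url:
        return False
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    if parts.scheme not in ("http", "https"):
        return False
    if parts.path != ALLOWED_PATH:
        return False
    # netloc keeps any userinfo, so "trusted@evil.example" fails the lookup.
    origin = f"{parts.scheme}://{parts.netloc}"
    return origin in allowed_origins
```

Comparing the full `scheme://netloc` string against the allowlist, rather than just the hostname, keeps scheme downgrades and unexpected ports from slipping through.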

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
A misconfigured provider (wrong scope, perma-401) would otherwise spawn
a fresh consent prompt on every tool call in a turn: the per-tool-use
retry guard reset for each new toolUseId, so the model could trigger
prompt-after-prompt with no upper bound. Track attempted providers on
the hook itself, reset on `BeforeInvocationEvent` (fires per turn,
including resume), so the user sees at most one consent prompt per
provider per turn before 401s flow through to the model.

Also clarify the `event.interrupt(name="oauth:{provider_id}")` comment:
the SDK's BeforeToolCallEvent._interrupt_id folds in `toolUseId`, so
parallel tool calls to the same provider already produce distinct
interrupt ids. New regression test pins that invariant.
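
The per-turn cap described in the first paragraph can be pictured as a small piece of state on the hook. The sketch below is a simplified stand-in: `ConsentAttemptGuard` and its method names are hypothetical, and the real hook would be wired to the Strands `BeforeInvocationEvent` and `BeforeToolCallEvent` callbacks rather than called directly.

```python
class ConsentAttemptGuard:
    """Allow at most one consent prompt per provider per turn."""

    def __init__(self) -> None:
        self._attempted: set[str] = set()

    def on_before_invocation(self) -> None:
        # Called at the start of every turn, including a resumed one.
        self._attempted.clear()

    def should_prompt(self, provider_id: str) -> bool:
        # The first request for a provider in this turn may prompt; later
        # ones return False so the 401 is passed through to the model.
        if provider_id in self._attempted:
            return False
        self._attempted.add(provider_id)
        return True
```

Keying the guard on the provider instead of the `toolUseId` is what bounds the prompts: each new tool call gets a new `toolUseId`, but the provider id stays the same for the whole turn.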

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
A stream replay after refresh, or a late server-side breadcrumb clear,
could fire the same `oauth_required` event again after a successful
consent or explicit dismissal — and the prompt would resurrect because
provider-keyed dedup re-added the entry. Track seen interrupt ids on
the consent service so already-resolved interrupts stay gone for the
session. New tool calls always carry a fresh interrupt id (Strands
generates it from `toolUseId`), so legitimate prompts are never
suppressed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
return
result = self._mark_disconnected(provider_id)
if inspect.isawaitable(result):
await result
… stack

The referenced tables live in InfrastructureStack (moved there to break a
prior circular dep); update 9 SSM-read comments to match.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@philmerrell philmerrell merged commit d04f808 into develop Apr 27, 2026
33 checks passed